Adaptive block size for dense QR factorization in hybrid CPU-GPU systems via statistical modeling

Authors

  • Ray-Bing Chen
  • Yaohung M. Tsai
  • Weichung Wang
Abstract

Article history: Received 14 March 2013; received in revised form 7 December 2013; accepted 6 March 2014; available online 5 April 2014.

Similar resources

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development...

GPU Acceleration of Small Dense Matrix Computation of the One-Sided Factorizations

Various scientific applications use Gaussian elimination or Cholesky or QR factorization to solve dense linear systems. For an important class of problems, a relatively large number of small size systems is generated and must be solved. Typically, the order of these linear systems is up to a few hundred, and their number is from a few thousand to millions. For example, subsurface transportation ...

A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

Aiming to fully exploit the computing power of all CPUs and all GPUs on hybrid CPU-GPU systems to solve dense linear algebra problems, we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, as well as to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentr...

Towards a multifrontal QR factorization for heterogeneous architectures over runtime systems

During the last decade, computer architectures for high performance computing have considerably evolved toward heterogeneous systems equipped with different types of computational units and a higher number of cores per chips. An example of popular heterogeneous architectures widely adopted in the high performance computing domain are GPU-based systems. In the work presented in this talk we stud...

Algorithm 9xx: Sparse QR Factorization on the GPU

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high-performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a highly optimized me...

Journal:
  • Parallel Computing

Volume 40  Issue 

Pages  -

Publication date: 2014